[SPARK-40907][PS][SQL] PandasMode should copy keys before inserting into Map #38385
Closed
zhengruifeng wants to merge 1 commit into apache:master
HyukjinKwon approved these changes Oct 25, 2022
itholic approved these changes Oct 25, 2022
Contributor itholic left a comment
Not strong feeling about my nit comment, LGTM.
Comment on lines +6050 to +6054
    rdd = self.spark.sparkContext.parallelize(
        [
            1,
        ],
        4,
Contributor
nit: can we just write this in one or two lines, something like `rdd = self.spark.sparkContext.parallelize([1], 4).mapPartitionsWithIndex(f)`? I suspect it was adjusted by the black script, though 😂
Contributor
Author
It was just reformatted by the script 😅
Member
Merged to master.
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…into Map

### What changes were proposed in this pull request?
Make `PandasMode` copy keys before inserting into Map

### Why are the changes needed?
correctness issue similar to apache#38383, make it a separate PR since it is dedicated for Pandas API

```
In [24]: def f(index, iterator): return ['3', '3', '3', '3', '4'] if index == 3 else ['0', '1', '2', '3', '4']

In [25]: rdd = sc.parallelize([1, ], 4).mapPartitionsWithIndex(f)

In [26]: df = spark.createDataFrame(rdd, schema='string')

In [27]: psdf = df.pandas_api()

In [28]: psdf.mode()
Out[28]:
  value
0     4

In [29]: psdf._to_pandas().mode()
Out[29]:
  value
0     3
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
added UT

Closes apache#38385 from zhengruifeng/ps_mode_fix.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
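The expected answer for the reproducer in the commit message can be checked without a Spark session. The snippet below is only a sanity check in plain Python: it rebuilds the rows that `f` yields for the 4 partitions and counts them with `collections.Counter`. The helper name `partition_rows` is made up for this illustration and is not part of the PR.

```python
from collections import Counter


def partition_rows(index):
    # Same values that f(index, iterator) returns in the reproducer;
    # the iterator argument is ignored there, so it is omitted here.
    return ['3', '3', '3', '3', '4'] if index == 3 else ['0', '1', '2', '3', '4']


# parallelize([1, ], 4) creates 4 partitions, with indices 0..3.
rows = [value for index in range(4) for value in partition_rows(index)]

counts = Counter(rows)
expected_mode, expected_count = counts.most_common(1)[0]

print(counts)
print(expected_mode, expected_count)
```

The value `'3'` occurs 7 times and `'4'` only 4 times, which agrees with the pandas result in `Out[29]` and not with `Out[28]`.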
What changes were proposed in this pull request?
Make `PandasMode` copy keys before inserting into Map
Why are the changes needed?
Correctness issue similar to #38383; this is a separate PR because it is dedicated to the Pandas API.
Does this PR introduce any user-facing change?
No
How was this patch tested?
added UT
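For readers wondering why copying keys matters, here is a rough Python sketch of the general hazard. It assumes the problem is a reused, mutable key object being stored in a hash map by reference; it is not Spark's code (the real `PandasMode` is implemented in Scala), and `MutableKey` and `mode_of` are hypothetical names used only for illustration.

```python
class MutableKey:
    """Hypothetical stand-in for a key buffer that is reused across rows."""

    def __init__(self, value=None):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.value == other.value

    def copy(self):
        return MutableKey(self.value)


def mode_of(values, copy_keys):
    buffer = MutableKey()  # one object, overwritten for every input row
    counts = {}
    for v in values:
        buffer.value = v
        # Without the copy, every map entry refers to the same mutable object.
        key = buffer.copy() if copy_keys else buffer
        counts[key] = counts.get(key, 0) + 1
    best_key, _ = max(counts.items(), key=lambda kv: kv[1])
    return best_key.value


rows = ['0', '1', '2', '3', '4'] * 3 + ['3', '3', '3', '3', '4']
print(mode_of(rows, copy_keys=True))
print(mode_of(rows, copy_keys=False))
```

With copies, the most frequent value is reported. Without them, every stored key aliases the buffer, so the reported key is whatever value was written last, regardless of which entry has the highest count.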